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Abstract — Situations in many fields of research, such as digital 
communications, nuclear physics and mathematical finance, can 
be modelled with random matrices. When the matrices get large, 
free probability theory is an invaluable tool for describing the 
asymptotic behaviour of many systems. It will be shown how free 
probability can be used to aid in source detection for certain 
systems. Sample covariance matrices for systems with noise are 
the starting point in our source detection problem. Multiplicative 
free deconvolution is shown to be a method which can aid in 
expressing limit eigenvalue distributions for sample covariance 
matrices, and to simplify estimators for eigenvalue distributions 
of covariance matrices. 

Index Terms — Free Probability Theory, Random Matrices, de- 
convolution, limiting eigenvalue distribution, MIMO, G-analysis. 



I. Introduction 

Random matrices, and in particular limit distributions of 
sample covariance matrices, have proved to be a useful tool 
for modelling systems, for instance in digital communications 
[1], [2], [3], [4], nuclear physics [5], [6] and mathemati- 
cal finance [7], [8]. A typical random matrix model is the 
information-plus-noise model, 



W„ = 4(R„ + aX„)(R„ + (tX,/. 



N 



(1) 



R„ and X n are assumed independent random matrices of 
dimension n X N throughout the paper, where X„ contains 
i.i.d. standard (i.e. mean 0, variance 1) complex Gaussian 
entries. ([TJ can be thought of as the sample covariance matrices 
of random vectors r n + crx„. r„ can be interpreted as a vector 
containing the system characteristics (direction of arrival for 
instance in radar applications or impulse response in channel 
estimation applications). x„ represents additive noise, with a 
a measure of the strength of the noise. Throughout the paper, 
n and N will be increased so that 



lim — = c, 

n — >oo ./V 



(2) 



i.e. the number of observations is increased at the same rate 
as the number of parameters of the system. This is typical of 
many situations arising in signal processing applications where 
one can gather only a limited number of observations during 
which the characteristics of the signal do not change. 
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The situation motivating our problem is the following: 
Assume that N observations are taken by n sensors. Observed 
values at each sensor may be the result of an unknown number 
of sources with unknown origins. In addition, each sensor is 
under the influence of noise. The sensors thus form a random 
vector r„ + crx„, and the observed values form a realization of 
the sample covariance matrix W„. Based on the fact that W„ 
is known, one is interested in inferring as much as possible 
about the random vector r n , and hence on the system ((TJ. 
Within this setting, one would like to connect the following 
quantities: 

1) the eigenvalue distribution of W„, 

2) the eigenvalue distribution of r„ = 

3) the eigenvalue distribution of the covariance matrix 
® n = E(r n r%). 

In [9], Dozier and Silverstein explain how one can use 2) to 
estimate 1) by solving a given equation. However, no algorithm 
for solving it was provided. In fact, many applications are 
interested in going from 1) to 2) when attempting to retrieve 
information about the system. Unfortunately, [9] does not 
provide any hint on this direction. Recently, in [10], we show 
that the framework of [9] is an interpretation of the concept 
of multiplicative free convolution. Moreover, [10] introduces 
the concept of free deconvolution and provides an estimate of 
2) from 1) in a similar way as estimating 1) from 2). 

3) can be adressed by the G2-estimator [11], which provides 
a consistent estimator for the Stieltjes transform of covariance 
matrices, the basis for the estimation being the Stieltjes 
transform of sample covariance matrices. G-estimators have 
already shown their usefulness in many applications [12] but 
still lack intuitive interpretations. In [10], we also show that 
the G2 -estimator can be derived within the framework of 
multiplicative free convolution. This provides a computational 
algorithm for finding 2). Note that 3) can be found directly, 
without finding 2) as demonstrated in [13]. However, the 
latter does not provide a unified framework for computing the 
complete eigenvalue distribution but only a set of moments. 

Beside the mathematical framework, we also address im- 
plementation issues of free deconvolution. Interestingly, mul- 
tiplicative free deconvolution admits a convenient implemen- 
tation, which will be described and demonstrated in this 
paper. Such an implementation will be used to address several 
problems related to signal processing. For communication 
systems, estimation of the rank of the signal subspace, noise 
variance and channel capacity will be addressed. 

This paper is organized as follows. Section [TT] presents the 
basic concepts needed on free probability, including mul- 
tiplicative and additive free convolution and deconvolution. 
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Section [TTT] states the results for systems of type ©. In 
particular, finding quantities 2) and 3) from quantity 1) will 
be addressed here. Section [IV] presents implementation issues 
of these concepts. Section [V] will explain through examples 
and simulations the importance of the system © for digital 
communications. In the following, upper (lower boldface) 
symbols will be used for matrices (column vectors) whereas 
lower symbols will represent scalar values, (.) T will denote 
transpose operator, (.)* conjugation and (.) H = ((-) T ) 
hermitian transpose. I will represent the identity matrix. 

II. Framework for free convolution 

Free probability [14] theory has grown into an entire field 
of research through the pioneering work of Voiculescu in 
the 1980's [15] [16] [17] [18]. The basic definitions of free 
probability are quite abstract, as the aim was to introduce 
an analogy to independence in classical probability that can 
be used for non-commutative random variables like matrices. 
These more general random variables are elements in what 
is called a noncommutative probability space. This can be 
defined by a pair (A, 0), where A is a unital ^-algebra with 
unit 7, and is a normalized (i.e. (f)(1) — 1) linear functional 
on A. The elements of A are called random variables. In all 
our examples, A will consist of n x n matrices or random 
matrices. For matrices, will be the normalized trace tr n , 
defined by (for any a € A) 

1 1 " 

tr n (a) = —Tr(a) = - V" an, 
n n z — ' 

i=i 

while for random matrices, will be the linear functional r n 
defined by 

1 " 

T n (a) = - E(a u ) = E(tr n (a)). 
n 

i=l 

The unit in these ^-algebras is the n x n identity matrix I n . 
The noncommutative probability spaces considered will all be 
tracial, i.e. 4> satisfies the trace property 0(a6) = <p(ba). The 
analogy to independence is called freeness: 

Definition 1: A family of unital *-subalgebras (A,)igj will 
be called a free family if 

f a 3 e A h | 

< ii ^ i 2 , 12 ^ h, ■ ■ ■ I hi-1 7^ hi > => 4>(ai ■ ■ ■ o n ) = 0. 
I 0(oi) = 0(o 2 ) = ■ ■ ■ = 4>{a n ) = I 

(3) 

A family of random variables Oj is called a free family if the 
algebras they generate form a free family. 

One can note that the condition i\ ^ i n is not included in 
the definition of freeness. This may seem strange since if 
is a trace and i\ = i n , we can rearrange the terms so that 
two consecutive terms in © come from the same algebra. If 
this rearranged term does not evaluate to zero through the 
definition of freeness, the definition of freeness would be 
inconsistent. It is not hard to show that this small issue does 
not cause an inconsistency problem. To see this, assume that 
© is satisfied for all indices where the circularity condition 



i\ i n is satisfied. We need to show that © also holds for 
indices where i\ = i n . By writing 

a n ai = (a n ai-(f)(a n ai)I)+(j)(a n ai)I = bi+(f>(a n ai)I, (4) 

we can express <fi(a,i ■ ■ ■ a n ) — 4>(a n a\a2 ■ • ■ o„_i) as a sum 
of the two terms <j)(b\a2 ■ ■ ■ a n -i) and 4>(a n ai)4>(a 2 ■ ■ ■ a n -\). 
The first term is by assumption, since </>(&i) = 0, b\ S Ai n 
and i n ^ i n -v Th e second term (f>(a n ai)(j)(a2 ■ ■ ■ o n _i) 
contributes with zero when %2 ^ i n -i by assumption. If 
*2 = in-ii we use the same splitting as in (0]i again, but this 
time on <p(a2 ■ ■ ■ a n -i) = 0(o n _iO2O3 • • • a n ^. 2 ), to conclude 
that (f)(a2 ■ ■ ■ a n -i) evaluates to zero unless 13 = i n -2- 
Continuing in this way, we will eventually arrive at the term 

< M a n/2 a 7i/2+i) if n i s even ^ or the term < M a (n+i)/2) if n is 
odd. The first of these is since i„/ 2 ^ i n /2+i, an d the second 
is by assumption. 

Definition 2: We will say that a sequence of random vari- 
ables a n i,a n2 , ... in probability spaces (A n ,<f) n ) converge in 
distribution if, for any mi, m r £ Z, ki, k r £ {1, 2, ...}, 
we have that the limit 4>n( a ^ki ' ' ' a ™fe ) ex ists as n — > 00. 
If these limits can be written as 4>{a™^ ••' a fe Ir ) f° r some 
noncommutative probability space (A, </>) and free random 
variables 01, 02, ... £ (A, </>), we will say that the a„i, a„2, ■■■ 
are asymptotically free . 

Asymptotic freeness is a very useful concept for our pur- 
poses, since many types of random matrices exhibit asymptotic 
freeness when their sizes get large. For instance, consider 
random matrices iA„i, ^A„2, where the A n j are nxn 
with all entries independent and standard Gaussian (i.e. mean 
and variance 1). Then it is well-known [14] that the -3=A nj are 
asymptotically free. The limit distribution of the -4= A n i in this 
case is called circular, due to the asymptotic distribution of 
the eigenvalues of -y^Anf. When n — ► 00, these get uniformly 
distributed inside the unit circle of the complex plane [19], 
[20]. 

© enables us to calculate the mixed moments of free 
random variables ai and 0,2- In particular, the moments of 
di + ct2 and a\a 2 can be calculated. In order to calculate 
4>({a\ + a2) A ), multiply out (ai + CI2) 4 , and use linearity and 
© to calculate all cp^a^a^a^a^) (ij = 1, 2). For example, 
to calculate 0(010,2(1102), write it as 

<f>( ((01 - 0(oi)J) + 0(oi)7) ((o 2 - 0(o 2 )I) + 0(o 2 )I) 
((oi - 0(oi)I) + 0(oi)/) ((o 2 - 4>{a 2 )I) + 0(o 2 )/)), 
and multiply it out as 16 terms. The term 

0((ai - 0(ai)/)(o 2 - 0(02) -0 

(oi - (j)(ai)I)(a 2 - (t>(a 2 )I)) 

is zero by ©. The term 

0((oi - 0(ai)7)0(o2)/(oi - 0(oi)7)(o 2 - 0(o2)7)) 
= 0(o 2 )0((ai - 0(ai)7)(oi - 0(oi)7)(a 2 - 0(a 2 )7)) 

can be calculated by writing 

b = (oi - 0(oi)7)(oi - 0(oi)7) 
(which also is in the algebra generated by ai), setting 
b = (b - 0(6)7) + 0(6)7, 
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and using (0 again. The same procedure can be followed for 
any mixed moments. 

When the sequences of moments uniquely identify proba- 
bility measures (which is the case for compactly supported 
probability measures), the distributions of a% + ci2 and 0402 
give us two new probability measures, which depend only 
on the probability measures associated with the moments of 
01, ci2- Therefore we can define two operations on the set of 
probability measures: Additive free convolution 



fix EH fi 2 



(5) 



for the sum of free random variables, and multiplicative free 
convolution 

Hi B H2 (6) 

for the product of free random variables. These operations 
can be used to predict the spectrum of sums or products of 
asymptotically free random matrices. For instance, if a,\ n has 
an eigenvalue distribution which approaches \i\ and 02 n has 
an eigenvalue distribution which approaches pi, one has that 
the eigenvalue distribution of a% n +a2 n approaches /iiffl/i2, so 
that fii EH fi2 can be used as an eigenvalue predictor for large 
matrices. Eigenvalue prediction for combinations of matrices 
is in general not possible, unless we have some assumption on 
the eigenvector structures. Such an assumption which makes 
random matrices fit into a free probability setting (and make 
therefore the random matrices free), is that of uniformly 
distributed eigenvector structure (i.e. the eigenvectors point 
in some sense in all directions with equal probability). 

We will also find it useful to introduce the concepts of 
additive and multiplicative free deconvolution: 

Definition 3: Given probability measures /1 and /i 2 - When 
there is a unique probability measure pi such that 



fj, = p,i iti p 2 , M = Ml 
we will write 



1 p 2 respectively, 



/Zi = p B /i 2 , \i\ = /iH/i2 respectively. 

We say that [i\ is the additive free deconvolution (respectively 
multiplicative free deconvolution) of /i with /i 2 . 
It is noted that the symbols presented here for additive and 
multiplicative free deconvolution have not been introduced in 
the literature previously. With additive free deconvolution, one 
can show that there always is a unique \i\ such that p = 
pi EH /i2. For multiplicative free deconvolution, a unique p\ 
exists as long as we assume non-vanishing first moments of 
the measures. This will always be the case for the measures 
we consider. 

Some probability measures appear as limits for large ran- 
dom matrices in many situations. One important measure is the 
Marchenko Pastur law p c ([21] page 9), also known as the free 
Poisson distribution in free probability. It is characterized by 
the density 



(1 --)+*(*) + ^ 
c 



o)+(6- 



2lTCX 

where (z)+ = max(0, z), a = (1 — ^fc) 2 and b = (1 + ^fc) 2 . In 
figure Q] p c is plotted for some values of c. It is known that 



(7) 



c=0.5 

c=0.2 

c=0.1 - 
- - c=0.05 




Fig. 1. Different Marchenko Pastur laws fi c 



p c describes asymptotic eigenvalue distributions of Wishart 
matrices. Wishart matrices have the form ^RR 5 , where R 
is annxJV random matrix with independent standard Gaussian 
entries. p c appears as limits of such when 4n> — * c when n — > 
00, Note that the Marchenko Pastur law can also hold in the 
limit for non-gaussian entries. 

We would like to describe free convolution in terms of 
the probability densities of the involved measures, since this 
provides us with the eigenvalue distributions. An important 
tool in this setting is the Stieltjes transform ([21] page 38). 
For a probability measure p, this is the analytic function on 
C+ = {z e C : Imz > 0} defined by 

m^z) = J ^— z dF»{\), (8) 

where is the cumulative distribution function of /i. All 
/j, we will consider have support on the positive part of the 
real line. For such p, can be analytically continued to the 
negative real line, where the values of m M are real. If [i has 
compact support, we can expand m^(z) in a Laurent series, 
where the coefficient are the moments fik of ji: 



00 



z * — ' z" 

k=0 



(9) 



A convenient inversion formula for the Stieltjes transform also 
exists: We have 



/ M (A) = lim —Im{mJ\ + juj)] 

lo^0+ IT 



III. Information plus noise model 



(10) 



In this section we will indicate how the quantities 2) and 3) 
can be found. The connection between the information plus 
noise model and free convolution will be made. 

A. Estimation of the sample covariance matrix 2) 

In [10], the following result was shown. 
Theorem 1: Assume that T n = ■^■R„R,^ converge in 
distribution almost surely to a compactly supported probability 
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measure fir- Then we have that W„ also converge in dis- 
tribution almost surely to a compactly supported probability 
measure nw uniquely identified by 

MvkHMc = (MrHMc) EH Mo- 2 j- (H) 
Theorem Q] addresses the relationship from 2) to 1), since 
( fTTT i can be "deconvolved" to the following form: 

Mw = ((MrHMc) EQ Mct 2 j) ^ Me- (12) 

It also addresses the relationship from 1) to 2), which is of 
interest here, through deconvolution in the following form: 

fir = (((jtwRfic) B/V 2 /) Elfic- (13) 



10i ■ ■ ■ 1 10. ■ ■ ■ 1 

8 8 

6 6 

4 4 

I i iiiiii i I I LiLu 

0.5 1 1.5 2 0.5 1 1.5 2 

(a) N = 64 (b) N = 512 

Fig. 2. Histograms of eigenvalues of sample covariance matrices. The 
covariance matrix has rank K = 8. We choose different number of sensors 
N, and choose c = 0.5 (so that L = 2N observations are taken). 



B. Estimation of the covariance matrix 3) 

General statistical analysis of observations, also called G- 
analysis [22] [12] is a mathematical theory studying complex 
systems, where the number of parameters of the considered 
mathematical model can increase together with the growth of 
the number of observations of the system. The mathematical 
models which in some sense approach the system are called 
G-estimators, and the main difficulty in G-analysis is to 
find consistent G-estimators. We use N for the number of 
observations of the system, and n for the number of parameters 
of the mathematical model. The condition used in G-analysis 
expressing the growth of the number of observations vs. the 
number of parameters in the mathematical model, is called 
the G-condition. The G-condition used throughout this paper 
is ©. 

We restrict our analysis to systems where a number of 
independent random vector observations are taken, and where 
the random vectors have identical distributions. If a random 
vector has length n, we will use the notation 0„ to denote the 
covariance. Girko calls an estimator for the Stieltjes transform 
of covariance matrices a G2-estimator. In chapter 2.1 of [11] 
he introduces the following expression as candidate for a G2- 
estimator: 

G 2 , n (z) = ^-m^ n (e(z)), (14) 

where the term m^ n (6(z)) = jT x Tr jr„ - 6(z)I n \ . The 
function 9(z) is the solution to the equation. 

e(z)cm w 0{z)) - (1 - c) + ^ = 0. (15) 

z 

Girko claims that a function G2. n (z) satisfying ( TT3T > and ( fl4l i 
is a good approximation for the Stieltjes transform of the in- 
volved covariance matrices (z) = —Tr {0 n — zl n } . 
He shows that, for certain values of z, G2, n (z) gets close to 
TO /ie ( z ) wnen ti — > 00. 

In [10], the following connection between the G2 -estimator 
and multiplicative free convolution is made: 

Theorem 2: For the G2 -estimator given by (fl4l . ( fl5l ). the 
following holds for a nonempty open set in G + : 

G 2 , n {z) =TO Alrn H Alc (16) 
Theorem [2] shows that multiplicative free convolution can 
be used to estimate the covariance of systems. This addresses 



the problem of estimating quantity 3). Hence, the estimation 
of quantity 2) and 3) can be combined, since ( fT3l can be 
rewritten to 

firSfi c = (/wH/Uc) B /V 2 /. (17) 

Therefore, the following procedure needs to take place to 
estimate quantity 3) 

1 - perform multiplicative free deconvolution of the mea- 
sure associated with W„ and the marchenko pastur law 
using the G2 -estimator. 

2 - perform additive free deconvolution with fi„ii. In other 
words, perform a shift of the spectrum. 

In section IV-BI these steps are performed in the setting of 
channel correlation estimation. The combinatorial implemen- 
tation of free deconvolution from section IIV-A.1I is used to 
compute the G2 -estimator. 

We plot in figure [2] histograms of eigenvalues for various 
sample covariance matrices when the rank is K = 8. As one 
can see, if the number of sensors (N) are chosen much larger 
than the number of signals K, the eigenvalues corresponding 
to the signals only make up a small portion of the entire 
set of eigenvalues. If one has information on the number 
of impinging signals, it can therefore be wise to choose the 
appropriate number of sensors. 

In this paper, the difference between a probability measure, 

and an estimate of it, v, will be measured in terms of the 
Mean Square Error of the moments (MSE). If the moments 
of J x k dfi(x), J x k dv(x) are denoted by /i^, v^, respectively, 
the MSE is defined by 

$>ife-^| 2 (18) 

k<n 

for some number n. In our estimation problems, the measure 
v we test which gives the minimum MSE (MMSE) will be 
chosen. 

The measure \x can either be given explicitly in terms of 
the distribution of matrices R n (for which the measure is 
discrete and the moments are given by Uk = tr n (R%)), or 
random matrices, or it can be given in terms of just the 
moments. In a free probability setting, giving just the moments 
is often convenient, since free convolution can be viewed as 
operating on the moments. Since the G2-estimator uses free 
deconvolution, it will be subject to a Mean Square Error of 



IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 1, NO. 1, JANUARY 2007 



5 



1 

0.8 
j 0.6 
' 0.4 

0.2 




100 200 300 
N 

(a) 4 moments 



1| 

j 0.6 
) 

■ 0.4 

0.2 ... 
0- 



100 200 300 400 500 
N 



(b) 8 moments 



Fig. 3. MMSE of the first moments of the covariance matrices, and the 
first moments of the G2 estimator of the sample covariance matrices. The 
covariance matrices all have distribution i<5o + §"5l. Different matrix sizes 
N are tried. The value c = 0.5 is used. 



moments analysis. We will compute the MSE for different 
number of moments, depending on the availability of the 
moments. 

In figure [3j a covariance matrix with density + \&i 
has been estimated with the the G2-estimator. Sample co- 
variance matrices of various sizes are formed, and method 
A in section IIV-A.1I was used for the free deconvolution in 
the G2 -estimator. Finally, MSE of the first 4 and 8 moments 
were computed. It is seen that the MSE decreases with the 
matrix sizes, which confirms the accuracy of the G2-estimator. 
Also, the MSE is much higher when more moments are 
included. This is as expected, when we compare known exact 
expressions for moments of Gaussian random matrices [23], 
with the limits these converge to. 

Since the MSE is typically higher in the higher moments, 
we will in the simulations in this paper minimize a weighted 
MSE: 

X>fcK-^i 2 , (19) 

fc 

instead of the MSE in < fT~8b . Higher moments should have 
smaller weights Wk, since errors in random matrix models 
are typically larger in the higher moments. There may not 
be optimal weights which work for all cases. For the cases 

/2fe\ 

in this paper, the weights are w 2 k = I k I and w 2 k+i = 0, 

whick coincide with the Catalan numbers Ck- These weights 
are motivated from formulas for moments of Gaussian random 
matrices in the way explained below, and are used in this paper 
since the models we consider often involve Gaussian matrices. 

Moment 2k of a standard selfadjoint Gaussian matrix X n 
of size ft x ft satisfies [14] [24] 



lim T n (X 2 n k ) = C k . 

n — >oo 

Also, exact formulas for r n (X^ fc ) (4.1.19 in [14]) exist, where 
the proof uses combinatorics and noncrossing partitions (see 
section IIV-A.U . with emphasis on the quantity Ck being 
the number of noncrossing partitions of {1, ...,2k} where 
all blocks have cardinality two. From the exact formula for 
the moment of r„(X^ fe ), one can also see the difference 
between the limit as ?i — ► oo. This difference is approximately 
■nT 1 x card(S), where S is another set of partitions. Although 
this set of partitions is not the same as the noncrossing 
partitions, it can in some sense be thought of as "partitions 



where limited crossings may be allowed". The choice of Ck 
as weight is merely motivated from the belief that S possibly 
has similar properties as the noncrossing partitions, and that 
possibly the cardinality has a similar expression. 

In summary, figure [3] shows that for large matrices, the G 2 - 
estimator gets close to the actual covariance matrices although 
several sources can give contribution to errors: 

1) The sample covariance matrix itself is estimated, 

2) the estimator itself contributes to the error, 

3) the implementation of free deconvolution also con- 
tributes to the error. 

IV. Computation of free convolution 

One of the challenges in free probability theory is the 
practical computation of free convolution. Usual results exhibit 
asymptotic convergence of product and sum of measures, but 
do not explicitly provide a framework for computing the result. 
In this section, given two compactly supported probability 
measures, we will sketch how their free (de)convolved coun- 
terparts can be computed. In the cases we are interested (signal 
impaired with noise), the Marchenko Pastur law /i c will be one 
of the operand measures, while the other will be a discrete 
measure, i.e. with density 



,PiS\i(x), 



(20) 



i=i 



where pi is the mass at A^, and YliPi = 1- All Xi > 0, 
since only measures with support on the positive real line 
are considered (n x n sample covariance matrices have such 
eigenvalue distributions). This would be the distribution we 
observe in a practical scenario: Since a finite number of 
samples are taken, we only observe a discrete estimate of the 
sample covariance matrix. 

The Stieltjes transform m flc □ M can be found exactly for z 
on the negative real line by solving function equations [10], 
but one has to perform analytical continuation to the upper 
half of the complex plane prior to using the Stieltjes inversion 
formula. Indeed, note that since the power series (0 is only 
known to converge for z outside a disk with radius equal to 
the spectral radius, partial sums of (O can not necessarily be 
used to approach the limit in <TT0b . However, one can show 
that values of ra^(z) for z on the negative real line can be 
approximated by analytically continuing partial sums of 

When /j is discrete, one can show that solving the function 
equations boils down to finding the roots of real polynomials 
(see section HV-Bb . which then must be analytically continued. 
We will sketch a particular case where this can be performed 
exactly. We will also sketch two other methods for computing 
free convolution numerically. One method uses a combina- 
torial description, which easily admits an efficient recursive 
implementation. The other method is based on results on 
asymptotic freeness of large random matrices. 

A. Numerical Methods 

1) Method A: Combinatorial computation of free convolu- 
tion: The concept we need for computation of free convolution 
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presented in this section is that of noncrossing partitions [24]: 

Definition 4: A partition tt is called noncrossing if when- 
ever we have i < j < k < I with i ~ k, j ~ I (~ meaning 
belonging to the same block), we also have i ~ j ~ k ~ Z 
(i.e. i, j, fc, Z are all in the same block). The set of noncrossing 
partitions of {1, , , ., n} is denoted NC(n). 
We will write it = {B\, B r } for the blocks of a partition. 
\Bi\ will mean the cardinality of the block Bi. 

Additive free convolution. 

A convenient way of implementing additive free convolution 
comes through the moment-cumulant formula, which expresses 
a relationship between the moments of the measure and the 
associated i?-transform ([21] page 48). The i?-transform has 
domain of definition C + and can be defined in terms of the 
Stieltjes transform as 

M z ) = ™?{-*) - (21) 

The importance of the i?-transform comes from the additivity 
property in additive free convolution, 

a / , 1 Bj Ma (z) = :M*) + ( 22 ) 

Slightly different versions of the i?-transform are encountered 
in the literature. The definition (fJTJ is from [21]. In connection 
with free combinatorics, another definition is used, namely 
Rfj,(z) = zOlfj,(z). Write Rp(z) = J2 n ct n z n . The coefficients 
a n are called cumulants. The moment-cumulant formula says 
that 

k 

Vn= Y2 \\. a \BiV (23) 

7T={Bi,— : B k }£NC(n) »=1 

From (l23l it follows that the first n cumulants can be com- 
puted from the first n moments, and vice versa. Noncrossing 
partitions have a structure which makes them easy to iterate 
over in an implementation. One can show that (l23l can be 
rewritten to the following form suitable for implementation: 

= ^ a kCoef n - k ((1 + fitz + n 2 z 2 H ) k ) . (24) 

k<n 

Here coefk means the coefficient of z . Computing the 
coefficients of (1 + \i\z + fi2Z 2 + ■ ■ ■ ) h can be implemented 
in terms of a fc-fold discrete (classical) convolution, so that 
the connection between free and classical convolution can be 
seen both in terms of concept and implementation. (f24t can 
be implemented in such a way that the /z„ are calculated 
from a n , or, the other way around, the a n are calculated 
from the /i n . When computing higher moments, d24T i is quite 
time-consuming, since many (classical) convolutions of long 
sequences have to be performed. A recursive implementation 
of d24T > was made for this paper [25], and is briefly described 
in appendix U The paper [26] goes through implementation 
issues of free convolution in more detail. 

Additive free convolution in terms of moments of the 
involved measures can therefore beimplemented through the 
following steps: Evaluate cumulants using (124-b for the two 
measures, add these, and finally evaluate moments using d24"i i 
also. 



Multiplicative free convolution. 

The combinatorial transform we need for multiplicative free 
convolution is that of boxed convolution [24] (denoted by [*]), 
which can be thought of as a convolution operation on formal 
power series. The definition uses noncrossing partitions and 
will not be stated here. One power series will be of particular 
importance to us. The Zeta-series is intimately connected to 
Hi in that it appears as it's i?-transform. It is defined by 

Zeta(z) = ^ z l . 

i 

Zeta(z) has an inverse under boxed convolution, 

00 

M e6(^) = ^(-l)"- 1 C n _ 1 ^, 
n=l 

also called the Mobius series. Here (C„)^L 1 is the sequence 
of Catalan numbers (which are known to be related to the 
even moments of Wigner matrices). Define the moment series 
of a measure fi by 

00 1 1 

M{n){z) = V n k z k = — m At (-) - 1. 

z z 

k=l 

One can show that (l23l is equivalent to M(/i) = R(fi)^Zeta. 
One can in fact show that boxed convolution on power 
series is the combinatorial perspective of multiplicative free 
convolution on measures. Also, 

1) boxed convolution with the power series c n ~ 1 Zeta 
represents convolution with the measure [i c , 

2) boxed convolution with the power series c n ~ 1 Moeb 
represents deconvolution with the measure /i c . 

This is formalized as 

M^ a = M^{c n - l Zeta), 

and can also be rewritten to 

cM^ r = Zetdft(cMJ, (25) 

It can be shown that this is nothing but the moment-cumulant 
formula, with cumulants replaced by the coefficients of cM M , 
and moments replaced by the coefficients cM„g|„ e . Therefore, 
the same computational procedure can be used for passing 
between moments and cumulants, as for passing between the 
moments series of /1 M \i c and that of /i, the only difference 
being the additional scaling of the moments by c: 

1) multiply all input moments by c prior to execution of 
(El, 

2) divide all output moments by c after execution of d24"l >. 
The situation for other compactly supported measures than /i c 
follows the same lines, but with c n ~ 1 Zeta and c n ~ 1 Moeb 
replaced with other power series. Convolution and deconvolu- 
tion with other measures than /i c may be harder to implement, 
due to the particularly simple structure of the Zeta series. 

In addition to computing (l24l and performing step 1) and 
2) above, we need first to obtain the moments of fi in some 
way. For fi as in (|20| > the moments can be calculated by 
incrementally computing the numbers (A™,..., A™), adding 
these together and normalizing. At the end, we may also need 
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to retrieve the probability density from the computed moments. 
If a density corresponds to the eigenvalue distribution of some 
matrix, the Newton-Girard Formulas [27] can be used to re- 
trieve the eigenvalues from the moments. These formulas state 
a relationship between the elementary symmetric polynomials 



n^Ai,...^) 



E 

i i < ■ ■ ■ < ij < n 



Ki ■ • • At 



and the sums of the powers of their variables 



Sp(\i, A n ) 



through the recurrence relation 



E ^ 

Ki<n 



(26) 



(27) 



(-l) m mn m (Ai,...,A„) 

+ E'feLi(-l) fc+m S fc (Ai,...,A„)n m _ fe (Ai,...,A„) = o. 

(28) 

If S p (Xi, A n ) are known for 1 < p < n, (l28| > can be used 
repeatedly to compute II m (Ai, A„), 1 < m < n. 
Coefficient n — k in the characteristic polynomial 

(A - Ai) • • • (A - A n ) 

is (— l) fe ITfc(Ai, A n ), and these can be computed from 
Sft(Ai, A„) using d28l . Since <Sfc(Ai, A n ) = nrrik (with 
nrik being the kth moment), the entire characteristic polynomial 
can be computed from the moments. Hence, the eigenvalues 
can be found also. 

In general, the density can not be written as the eigenvalue 
distribution of a matrix, but the sketched procedure can still 
provide us with an estimate based on the moments. Intuitively, 
the approximation should work better when more moments 
are involved. The simulations in this paper use the sketched 
procedure only for a low number of moments, since mostly 
discrete measures with few atoms are estimated. We have thus 
also avoided issues for solving higher degree characteristic 
equations with high precision. 

2) Method B: Computation of free convolution based on 
asymptotic freeness results: As mentioned, the Marchenko 
Pastur law can be approximated by random matrices of the 
form r„ = -i-R„R^, where R„ isnxJV with i.i.d. standard 
Gaussian entries. It is also known that the product of such a T n 
with a (deterministic) matrix with eigenvalue distribution /i has 
an eigenvalue distribution which approximates that of /i c [x] /i. 
This is formulated in free probability as a result on asymptotic 
freeness of certain random matrices with deterministic matri- 
ces [14]. Therefore, one can approximate multiplicative free 
convolution by taking a sample from a random matrix T n , 
multiply it with a deterministic diagonal matrix with eigen- 
value distribution p, and calculating the eigenvalue distribution 
of this product. The deterministic matrix need not be diagonal. 
Additive free convolution can be estimated in the same way by 
adding T n and the deterministic matrix instead of multiplying 
them. 

In figure [4] method B is demonstrated for various matrix 
sizes to obtain approximations of (^S + /i c for c = 

0.5. The moments of the approximations are compared with 
the exact moments, which are obtained with method A. The 
Mean Square Error of the moments is used to measure the 
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Fig. 4. MMSE of the first moments of (^<5q + E3 fi c , and the same 

moments computed approximately with method B using different matrix sizes 
N. The value c = 0.5 is used. 
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Fig. 5. Approximated densities of 
method B for various values of c 
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difference. As in figure [3] it is seen that the MSE decreases 
with the matrix sizes, and that the MSE is much higher when 
more moments are included. 

Another interesting phenomenon occurs when we let c go 
to 0, demonstrated in figure [5] for the measure /i with 



(29) 



It is seen that for small c, the support of \x c seems to 
split into disjoint components centered at the dirac locations. 
This is compatible with results from [28]. There it is just 
noted that, for a given type of systems, the support splits into 
TWO different components, and the relative mass between 
these components is used to estimate the numbers of signals 
present in the systems they consider. In figure [5] a matrix of 
dimension N x N with N = 1536 and eigenvalue distribution 
fj, is taken. This matrix is multiplied with a Wishart matrix 
■i-XX ff , where X has dimension N x L with j- = c with 
decreasing values of c. It is seen that the dirac at 1 is split from 
the rest of the support in both plots, with the split more visible 
for the lower value of c. The splitting of the two other dirac s 
from each other it not very visible for these values of c. Also, 
the peaks in the density of fx fi c occur slightly to the left 
of the dirac points, which is as expected from the comments 
succeeding theorem [3] 

A partial explanation for the fact that \i Kl fi c in some 
sense converges to fj, when c — * is given by combining 
the following facts: 

• The sample covariance matrix converges to the true 
covariance matrix when the number of observations tend 
to oo (i.e. c — > 0). 
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• The G2 -estimator for the covariance matrices is given by 

multiplicative free deconvolution with fi c . 
In summary, the differences between method A and B are 
the following: 

1 ) Method B needs to compute the full eigenvalue distribu- 
tion of the operand matrices. Method A works entirely 
on the moments. 

2) With method B, the results are only approximate. If the 
eigenvalues are needed also, method A needs to per- 
form computationally expensive tasks in approximating 
eigenvalues from moments, for instance as described in 
section IIV-A.1I 

3) Method B is computationally more expensive, in that 
computations with large matrices are needed in order to 
obtain accurate results. Method A is scalable in the sense 
that performance scales with the number of moments 
computed. The lower moments are the same regardless 
on how many higher moments are computed. 

The two methods should really be used together: While 
method A easily can get exact moments, method B can tell us 
the accuracy of random matrix approximations by comparison 
with these exact moments. 

The simulations in this paper will use method A, since 
deconvolution is a primary component, and since we in 
many cases can get the results with an MSE of moments 
analysis. Deconvolution with method B should correspond to 
multiplication with the inverse of a Wishart matrix, but initial 
tests do not suggest that this strategy works very well when 
predicting the deconvolved eigenvalues. 

The way method A and method B have been described here, 
they have in common that they only work for free convolution 
and deconvolution with Marchenko Pastur laws. Method A 
worked since d24l) held for such laws, while method B worked 
since these laws have a convenient asymptotic random matrix 
model in terms of Gaussian random matrices. 

B. Non-numerical methods: Exact calculations of multiplica- 
tive free convolution 

Computation of free convolution in general has to be 
performed numerically, for instance through the methods in 
section lTV-A.ll and lIV-A.2l In some cases, the computation can 
be performed exactly, i.e. the density of the (de)convolutions 
can be found exactly. Consider the specific case of d20l i where 



f»(x) = (l-p)6 (x)+p5 x (x), 



(30) 



where p < 1, A > 0. Such measures were considered in [13], 
where deconvolution was implemented by finding a pair (p, A) 
minimizing the difference between the moments of \i \i c , 
and the moments of observed sample covariance matrices. 
Exact expressions for the density of fi Kl \l c were not used, 
all calculations were performed in terms of moments. 

( f30l > contains one dirac (i.e. p = 1) as a special case. It is 
clear that multiplicative free convolution with S\ has an exact 
expression, since we simply multiply the spectrum with A (the 
spectrum is scaled). As it turns out, all fi of the form ( 130b give 
an exact expression for /i Kl /i c . In appendix lU the following 
is shown: 




(a) c = 0.5 



(b) c = 0.25 



25 



BE 

0.5 1 1.5 2 2.5 0.5 1 1.5 2 2.5 

(c) c = 0.5. L = 1024 observations (d) c = 0.25. L = 2048 observations 

Fig. 6. Densities of (^<5o + \ <5l) Kl /j c (upper row), and corresponding 
histogram of eigenvalues for the sample covariance matrices for different 
number of observations (lower row) 



Theorem 3: The density of /i H /i c is outside the interval 

h,c, P = [A(l + cp) - 2\^p-, A(l + cp) + 2A^] , (31) 



while the density on I\, c ,p is given by 



f^(x) = 



2c\xir 



(32) 



where 



K x {x) = x - A(l + cp) + 2\i/cp 
K 2 {x) = A(l + cp) + 2\Jcp- x. 



The density has a unique maximum at x = A 



(1-cp) 2 
1+cp 



with 



value — — ; -. 

C7TA{1— Cp) 

The importance of theorem[3]is apparent: The mass of /iS/i c 
is seen to be centered on A(l + cp), with support width of 
4Ay / cp. If we let c go to zero, the center of mass approaches A 
and the support width approaches zero. We note that the center 
of the support of /il/i c is slighly perturbed to the right of 
A, while the density maximum occurs slightly to the left of A. 
It is easily checked that the support width and the maximum 
density uniquely identifies a pair (p, A). This means that, if 
we have an estimate of the density of /i Kl /i c (for instance in 
the form of a realization of a sample covariance matrix) for 
a measure /i of the form (13 01) . the maximum density and the 
support width give us a good candidate for the (p, A) defining 
fj,. Figure [6] shows densities of some realizations of /i fi c for 
V = \ an d A = 1, together with corresponding realizations 
of covariances matrices. Values c = 0.25 and c = 0.5 were 
used, with L = 1024 and L — 2048 observations respectively. 
Covariance matrices of size 512 x 512 were used. 

A similar result to theorem [3] characterizing /iE3/i c is proved 
in appendix |nl| 
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Theorem 4: The density of /j,Q/i c is outside the interval 



Jx.cp = [A(l - 2cp) - 2X^ /cp(l - cp) 
A(l - 2cp) + 2X v /cp{l- cp)], 

while the density on J\, c ,p is given by 



y jL 1 (x)L 2 {x) 
2cx 2 



(33) 



(34) 



where 



Lx(x) = x - A(l - 2cp) + 2A^cp(l - cp) 
L 2 (aO = A(l - 2cp) + 2Av/cp(l - cp) - x. 

The support in this case is centered on A(l — 2cp), which 
is slightly to the left of A, contrary to the case of convolution. 
The support width is AX^J cp(l — cp). Also in this case it is 
easily seen that the support narrows and gets centered on A, as 
c goes to 0. The densities in (l32b and d34l > are seen to resemble 
the density of p, c in (7). One difference is that /i c is centered 
on 1, while the densities in d32i > and d34l > need not be. 

The proofs of theorem [3] and [4] build on an analytical 
machinery for computing free convolution, where several 
transforms play a role. Besides the Stieltjes transform m v (z), 
one has the ^-transform, defined for real z > 



1 + zX 



dF v {X). 



It is easy to verify that m„ can be analytically continued to 
the negative part of the real line, and that 



H) 



(35) 



for z > 0. The ^-transform has some nice properties, it is for 
instance strictly monotone decreasing, so that it has an inverse. 
In [10] it was shown that 



vJ( z ) 



(36) 



The proof of this involved some more transforms, like the S- 
transform, and the actual value of these transforms for /i c . The 
following forms will be more useful to us than ( f36b . and can 
be deduced easily from it: 



^ (z(l - c + cj]^ c ( z ))) = V^p c (z) 



and 



1 - c + cq^^z) 
for [i as in (l30l is easily calculated: 



7^0) = l-p + 



P 



1 + zX' 



(37) 



(38) 



(39) 



/i Kl /i c with unconnected support components centered at 
the dirac locations of \i may very well happen for discrete 
measures with more than two atoms also, but we do not 
address this question here. The more general case, even when 
there are two dirac 's away from 0, does not admit a closed- 
form solution, since higher degree equations in general can 
not be solved using algebraic methods. However, one can still 



solve those equations numerically: For /i on the more general 
form d20t . the ^-transform is 



Vn(z) 



\- Pi 



z -'l + zX l 

l—l 



Putting this into < [37] >, we see that we can solve 

n 

z2 1~, — m — \ r~v\ = v^A 2 ) 

^ 1 + zX, (1 - c + c?7 m b Mc (z)) 

to find r)^^ a (z). Collecting terms, we see that this is a higher 
order equation in rjaSu, c (z)- tti^z) and hence the density of 
\i can then be found from 



V. Applications to signal processing 

In this section, we provide several applications of free 
deconvolution and show how the framework can be used in 
this paper. 

A. Estimation of power and the number of users 

In communication applications, one needs to determine the 
number of users in a cell in a CDMA type network as well the 
power with which they are received (linked to the path loss). 
Denoting by n the spreading length, the received vector at the 
base station in an uplink CDMA system is given by: 



WP5 S , 



(40) 



where y,, W, P, Sj and are respectively the n X 1 
received vector, the n x N spreading matrix with i.i.d zero 
mean, — variance entries, the N x N diagonal power matrix, 
the N x 1 i.i.d gaussian unit variance modulation signals and 
the nx 1 additive white zero mean Gaussian noise. 

Usual methods determine the power of the users by finding 
the eigenvalues of covariance matrix of y; when the signatures 
(matrix W) and the noise variance are known. 



© = E (y 4 yf ) = WPW ff + a 2 l 



(41) 



However, in practice, one has only access to an estimate 
of the covariance matrix and does not know the signatures of 
the users. One can solely assume the noise variance known. 
In fact, usual methods compute the sample covariance matrix 
(based on L samples) given by: 



(42) 



and determine the number of users (and not the powers) 
in the cell by the non zero-eigenvalues (or up to an ad-hoc 
threshold for the noise variance) of: 







r 2 I 



(43) 



This method, referred here as classical method, is quite 
inadequate when L is in the same range as n. Moreover, it 
does not provide a method for the estimation of the power of 
the users. 

The free deconvolution framework introduced in this paper 
is well suited for this case and enables to determine the power 
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of the users without knowing their specific code structure. 
Indeed, the sample covariance matrix is related to the true o 
covariance matrix = E (yiyf) by: 



with 



= WPW 



H 



Vi 



(44) 



(45) 



and X is a n x L i.i.d Gaussian zero mean matrix. 

Combining (©, @3]>, with the fact that W H W, iXX H 
are Wishart matrices with distributions approaching /i™ 
respectively, and using that 



_ N 

A*wpw h — — Mw h wp 

n 



1 - - Uo, 

n 



we get due to asymptotic freeness the equation 



N ( N 
— Gujv H//p) +1 ) do 



Mr 
(46) 

If one knows the noise variance, one can use this equation in 
simulations in two ways: 

1) Through additive and multiplicative free deconvolution, 
use (|46| > where the power distribution of the users (and 
de facto the number of users) is expressed in terms of 
the sample covariance matrices. 

2) Determine the numbers of users N through a best-match 
procedure: Try all values of N with 1 < N < n, and 
choose the N which gives a best match between the left 
and right hand side in (l46l i. 

To solve d46b . method A was used to compute the moments. 
In order to solve ( Tol l, we also need to compute additive 
free deconvolution with a scalar. This was addressed in sec- 
tion IIV-A.1I but can also be computed in a simpler way, since 

0((a - a 2 lY) = (k) a 2k <t>{a?- k ). 

k=0 ^ ' 

In ( l46b we also scale a measure with — , and add an atom at 
0. Both of these cases are easily implemented. 

In the following simulations, a spreading length of n = 256 
and noise variance a 2 = 0.1 have been used. 

1) Estimation of power: We use a 36 x 36 (N = 36) 
diagonal matrix as our power matrix P, and use three sets 
of values, at 0.5, 1 and 1.5 with equal probability, so that 



L 1, 1. 

MP = 7j d 0.5 + gOl + 3<>l-5- 



(47) 



There are no existing methods for estimating such a /ip from 
the sample covariance matrices: to our knowledge, existing 
methods estimate the power with non-zero eigenvalues of the 
sample covariance matrix up to a 2 . In our case, the powers 
are all above a 2 . 

In figure [7j the CDF of /ip was estimated by solving 
d46i >. using method A with three moments. The resulting 
moments from method A were used to compute estimates of 
the eigenvalues through the Newton-Girard formula, and the 
CDF was computed by averaging these eigenvalues for 100 




Power 

(a) L = 256 



5 1 
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(b) L = 512 
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Fig. 7. CDF of powers estimated from multiplicative free deconvolution 
from sample covariance matrices with different number of observations. 
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Fig. 8. Distribution of the powers estimated from multiplicative free 
deconvolution from sample covariance matrices with L = 2048. 



runs for each number of observations. An alternative strategy 
would be to use higher moments, and less runs for each 
observation. As one can see, when L increases, we get a CDF 
closer to that of d47T >. The best result is obtained for L = 2048. 
The corresponding histogram of the eigenvalues in this case 
is shown in figure [8] 

2) Estimation of the number of users: We use a 36 x 36 
(N = 36) diagonal matrix as our power matrix P with /ip = 
6t. In this case, a common method that try to find just the 
rank exists. This method determines the number of eigenvalues 
greater than a 2 . Some threshold is used in this process. We 
will set the threshold at 1.5<7 2 , so that only eigenvalues larger 
that 1.5a 2 are counted. There are no general known rules for 
where the threshold should be set, so some guessing is inherent 
in this method. Also, choosing a wrong threshold can lead to 
a need for a very high number of observations for the method 
to be precise. 

We will compare this classical method with a free convolu- 
tion method for estimating the rank, following the procedure 
sketched in step 2). The method is tested with varying number 
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Fig. 9. Estimation of the number of users with a classical method, and free 
convolution L = 1024 observations have been used. 



of observations, from L = 1 to L = 4000, and the N which 
gives the best match with the moments of the SCM in (|46] > 
is chosen. Only the four first moments are considered. In 
figure [9] it is seen that when L increases, we get a prediction 
of N which is closer to the actual value 36. The classical 
method starts to predict values close to the right one only for 
a number of observations close to 4000. The method using 
free probability predicts values close to the right one for a 
less greater number of realizations. 

B. Estimation of Channel correlation 

In channel modelling, the modeler would like to infer on the 
correlation between the different degrees of the channel. These 
typical cases are represented by a received signal (assuming 
that a unit training sequence has been sent) which is given by 



y» = Wj 



(48) 



where y,, w, and are respectively the n x 1 received 
vector, the n x 1 zero Gaussian impulse response and n x 1 
additive white zero mean Gaussian noise with variance a. The 
cases of interest can be: 

• Ultra-wide band applications [29], [30], [31], [32] where 
one measures in the frequency domain the wide-band 
nature of the frequency signature w s ; 

• Multiple antenna applications [1], [33] with one transmit 
and n receiving antennas where w., is the spatial channel 
signature at time instant i. 

Usual methods compute the sample covariance matrix given 
by: 



R 



1 



L 

^ yiy? 



L ^ 

»=i 



The sample covariance matrix is related to the true covari- 
ance matrix of w, by: 



R = 05XX H 05 




-1 1 

(b) L = 512 



Fig. 10. CDF of eigenvalues estimated from multiplicative free deconvolution 
from sample covariance matrices with different number of observations. 



with 



= R 



(50) 



and X is an N x n i.i.d Gaussian zero mean matrix. 

Hence, if one knows the noise variance (measured without 
any signal sent), one can determine the eigenvalue distribution 
of the true covariance matrix following: 



(51) 



According to theorem |2j computing fi-^Ellia is the same as 
computing the GVestimator for covariance matrices. Additive 
free deconvolution with /i CT 2/ is the same as performing a shift 
of the spectrum to the left. 

We use a rank K covariance matrix of the form R = 
diag[l, 1, .., 1, 0, .., 0], and variance a 2 = 0.1, so that a ~ 
0.3162. For simulation purposes, L vectors w, with covariance 
R have been generated with n = 256 and K = 128. We would 
like to observe the p.d.f. 



(52) 



(49) 



in our simulations. 

In figure [lOl ( BTT l has been solved, using L = 128 and 
L = 512 observations, respectively. The same strategy as in 
section W-A\ was used, i.e. the CDF was produced by averaging 
eigenvalues from 100 runs. 4 moments were computed. Both 
cases suggest a p.d.f. close to that of ( f52b . It is seen that the 
number of observations need not be higher than the dimensions 
of the systems in order for free deconvolution to work. 

The second case corresponds to c = 0.5, so that when there 
is no noise (a = 0), the sample covariance is approximately 
+ !<5i)KI/xi, which is shown in figure|6]a). If it is known 
that the covariance has the density (1 — p)5o+pS\, theorem [3] 
can be used, so that we only need to read the maximum density 
and the location parameter in order to find p and A. 

It may also be that the true covariance matrix is known, 
and that we would like to estimate the noise variance through 
a limited number of observations. In figure [TT] L = 128 and 
L = 512 observations have been taken. In accordance with 
([STT l. we compute (urEB/^/")!^™ for a set of noise variance 
candidates if, and an MSE of the four first moments of this 
with the moments of the observed sample covariance matrix is 
computed. Values of 77 in (ct-0.1, cr+0.1) - (0.2162,0.4162) 
have been tested, with a spacing of 0.001. It is seen that the 
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0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 



Indeed, the MIMO measured channel is given by: 
H, = ^(H + aX I ) 



(53) 



where H^, H and Xj are respectively the n x n measured 
MIMO matrix (n is the number of receiving and transmitting 
antennas), the n x n MIMO channel and the n x n noise 
matrix with i.i.d zero mean unit variance Gaussian entries. 
We suppose that the channel stays constant (block fading 
assumption) during L blocks. In this case, the observed model 
becomes: 



H 



1...L 



= —j= ( Hi...l 



+ 7r Xl - 



(54) 



with 



Fig. 11. Estimation of the noise variance, 
observations have been used. 



128 and L 



512 



Hi.L — 



VI 



Hi, H 2 , H/ 



(55) 



MMSE occurs close to the value a = V0J = 0.3162, even 
if the number of observations is smaller than the rank. The 
MMSE occurs closer to a for L = 512 than for L = 128, 
so the estimate of sigma improves slightly with L. It is also 
seen that the MSE curve for L = 512 lies lower than the 
MSE curve for L = 128. An explanation for this lies in the 
free convolution with /j,n; As L — > oo, this has the effect of 
concentrating all energy at 1. 

VI. Further work 

In this work, we have only touched upon a fraction of the 
potential of free deconvolution in the field of signal processing. 
The framework is well adapted for any problem where one 
needs to infer on one of the mixing matrices. Moreover, 
tools were developed to practically deconvolve the measures 
numerically and where shown to be simple to implement. 
Interestingly, although the results are valid in the asymptotic 
case, the work presented in this paper shows that it is well 
suited for sizes of interest for signal processing applications. 
The examples draw upon some basic wireless communications 
problems but can be extended to other cases. In particular, 
classical blind methods [34], [35], [36] which assume an 
infinite number of observations or noisyless problems can be 
revisited in light of the results of this paper. The work can 
also be extended in several directions and should bring new 
insights on the potential of free deconvolution. 

A. Other applications to signal processing 

There are many other examples that could be considered 
in this paper. Due to limitations, we have detailed only two. 
For example, another case of interest can be the estimation of 
channel capacity. In usual measurement methods, one validates 
models [37], [38], [39], [40] by determining how the model 
fits with actual capacity measurements. In this setting, one has 
to be extremely cautious about the measurement noise. 



Hi 



VI 



[H,H,...,H] 



Xi...l — [Xi, x 2 



(56) 



(57) 



The capacity of a channel with channel matrix H and signal 
to noise ratio p — is given by 



C 



-logdet (l+-^HH H 



n 



(58) 
(59) 



where A; are the eigenvalues of iHH ff . The problem 
consists therefore of estimating the eigenvalues of iHH fl 
based on few observations of Hi. For a single observation, 
This can be done through the approximation 

MhiH" Q Mi = (i«Ihh« SA*i) ffl Ma- (60) 
If we have many observations, we have that: 

MH L i Hf = (m±hh»HMi) ffl M^=- (61) 

B. Other types of sample matrices 

One topic of interest is the use of free deconvolution with 
other types of matrices than the sample covariance matrix. In 
fact, based on a given set of observations, one can construct 
higher sample moment matrices than the sample covariance 
matrix (third product matrix for example). These matrices 
contain useful information that could be used in the problem. 
The main difficult issue here is to prove freeness of the 
convolved measures. The free deconvolution framework could 
also be applied to tensor problems [41] and this has not been 
considered yet to our knowledge. 
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C. Colored Noise 

In this work, the noise considered was supposed to be 
temporally and spatially white with standard Gaussian entries. 
This yields the Marchenko pastur law as the operand measure. 
However, the analysis can be extended, with the assumption 
that freeness is proved, to other type of noises: the case for 
example of an additive noise with a given correlation. In this 
case, the operand measure is not the Marchenko pastur law 
but depends on the limiting distribution of the sample noise 
covariance matrix. Although the mathematical formulation 
turns out to be identical, in terms of implementation, the 
problem is more complicated as one has to use more involved 
power series than the Zeta-senes for example. 

D. Parametrized distribution 

In the previous example (signal impaired with noise), the 
Marchenko Pastur law /_t c was one of the operand measures, 
while the other was either estimated or considered to be a 
discrete measure, i.e. with density 

n 

/"(*)= 5>i<Ms). (62) 
i=l 

It turns out that one can find also the parameterized distri- 
bution (best fit by adjusting the parameter) that deconvolves 
up to certain minimum mean square error. For example, one 
could approximate the measure of interest with two diracs 
(instead of the set of n diracs) and find the best set of 
diracs that minimizes the mean square error. One can also 
approximate the measure with the Marchenko pastur law for 
which the parameter c needs to be optimized. In both cases, 
the interesting point is that the expressions can be derived 
explicitly. 



1) Form the vector m — (1, of length n+1, and 
compute and store recursively the n vectors 

M\ = m, M2 = m* m,..., M n = * n m, 

where *„ stands for n-fold (classical) convolution with 
itself. The later steps in the algorithm use only the 
n + 1 first elements of the vectors Mi, M%,...,M n . 
Consequently, the full M& vectors are not needed for 
all k: We can truncate M^ to the first n + 1 elements 
after each convolution, so that the implementation can 
be made quite efficient. 

2) Calculate the cumulants recursively. If the first n — 1 
cumulants, i.e. the first an in d24t . have been found by 
solving the n — 1 first equations in ( l24b . a n can then 
be found through the nth equation in ( l24b . by using 
the vectors computed in step 1). More precisely ,the 
connection between the vectors in 1) and the value we 
use in J24b is 

coef n - k ((l + mz + H2Z 2 + ...) fe ) = M/.(n - k), 

where n — k denotes the index in the vector (starting 
from 0). Finding the fc'th cumulant a% by solving the 
kth equation in d24l i is the same as 

_ Mi(n + 1) - Ei< r <^i a r M r (k - r) 

ak mo) • 

The program for computing moments from cumulants is 
slightly more complex, since we can't start out by computing 
the vectors M\,...,M n separately at the beginning, since the 
moments are used to form them (these are not known yet). 
Instead, elements in Mi,...,M„ are added each time a new 
moment has been computed. 



VII. Conclusion 

In this paper, we have shown that free probability provides 
a neat framework for estimation problems when the number 
of observations is of the same order as the dimensions of the 
problem. In particular, we have introduced a free deconvo- 
lution framework (both additive and multiplicative) which is 
very appealing from a mathematical point of view and provides 
an intuitive understanding of some G-estimators provided by 
Girko [11]. Moreover, implementation aspects were discussed 
and proved to be adapted, through simulations, to classical 
signal processing applications without the need of infinite 
dimensions. 

Appendix I 

Algorithm for computing free convolution 

The files momcum . m and cummom . m in [25] are implemen- 
tations of d24l i in MATLAB. The first calculates cumulants 
from moments, the second moments from cumulants. Both 
programs are rather short, and both take a series of moments 
(/ii, ■■■,fJ, n ) as input. The algorithm for computing cumulants 
from moments goes the following way: 



Appendix II 
The proof of theoremO 

Set rj(z) — 7]n^n c . From (|37l i we see that we must solve 
the equation 

{z{l-c + aq{z))) =r)(z). 

Substituting ( |39l l, multiplying and collecting terms, we get that 
77(2) must be a zero for the equation 

cz\r](z) 2 + (l + z\(l - 2c + cp)))j](z)-(l-p)(l-c)z\-l. 

The analytical continuation of m(z) — m^„ c (z) to the neg- 

ative part of the real line satisfies 77(2) = — \ z ■ Subsituting 
this and also substituting u = — i, we get that 

-c\zm(z) 2 + (\(l-2c + cp)-z)m(z) + -\(l~p)(l~c)-l 

(63) 

equals for z which are real and negative. It is clear that 
any analytical continuation of m(z) to the upper half of the 
complex plane also must satisfy d63l . We use the formula for 
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the solution of the second degree equation and get that m(z) 
equals 



-A(l-2c+cp)+z±, 



(A(l - 2c + cp) - z) 2 
\ +4cA 2 (l -p)(l -c) -AcXz 



-2c\z 



(A(l-2c+cp)+2=F 



z 2 - 2A(1 + cp)z 
+4cA 2 (l-c)(l-p) 
\l +X 2 (l-2c + cp) 2 



(64) 



2c\z 



The zeroes of the discriminant here are 



2(A(l+cp)± A /A 2 4(l+cp) 2 -16cA 2 (l-c)(l-p)-4A 2 (l-2c+cp) 2 
A(l + cp) 

x / 4(l + cj-) 2 -16c(l-c)(l-p) 
2 V -4(1 + cp) 2 - 16c 2 + 16c(l + cp) 



= A(l + cp) ± \\y/lQc(l + cp-c-(l- c)(l - p)) 
= A(l + cp) ± 2X^cp, 

This means that we can rewrite m(z) to 



\(l-2c + cp) + zT 



(z - A(l + cp) + 2X^/cp) 
(z - A(l + cp) - 2\^p) 



2c\z 

Thus, for z real, m(z) is complex if and only if z lies in 
the interval I\,c,p of ( f3Tb - Outside I\, c , p , the density of /i Kl 
;Lt c is zero. Taking the imaginary part and using the Stieltjes 
inversion formula, we get that the density in I\ <c ,p is given by 
the formula (l32l . 

Setting the derivative of (|32l w.r.t. z equal to z gives us 
a first degree equation which yields a unique maximum at 



l+cp 



After some more calculations, we get that the 



density at this extremal point is c ^^(J^_ cp ^ ■ This finishes the 
proof. ■ 



Appendix III 
The proof of theorem 0] 

Set rj(z) = T^rg „ c , we see from ( |38l l that we must solve 
the equation 



1 — C + CT)(z) 



= ri{z). 



Substituting d39] l, multiplying and collecting terms, we get that 
rj(z) must be a zero for the equation 

cq(z) 2 + (1 - 2c + zX)ri(z) - (1 - c) - zA(l - p). 

Using the formula for the solution of the second degree 
equation, one can see that the positive square root must be 
chosen whenever c < \, since r\ assumes positive positive 

values whenever z > 0. Substituting r/(z) = — \ z and also 
u = — - as in the case for convolution, we get that 



cz 2 m(z) 2 + (A - z(l - 2c))m(z) + -A(l - p) - (1 - c) 

z 



equals for z which are real and negative. We get that m(z) 
equals 



-A+z(l-2c)±, 



(A-z(l-2c)) 2 
S\ -4cz 2 (lA(l-p)-(l-c)) 



(65) 



-A+z(l-2c)±^/z 2 -2A(l-2cp)z+A 2 
— 2cP ' 

The zeroes of the discriminant are 



2A(l-2cp)±- v /4A 2 (l-2cp) 2 -4A 2 

= A(l - 2cp) ±\y/ AX 2 (Ac 2 p 2 - Acp) 2 
= A(l - 2cp) ± 2Xy/cp{l - cp). 

Following the same reasoning as for convolution, we see that 
the density is outside the interval Ja, c ,p of d33l ). and that the 
density in J\. c . p is given by d34l . This finishes the proof. ^ 
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